Multi-value Classification of Very Short Texts

نویسندگان

  • Andreas Heß
  • Philipp Dopichaj
  • Christian Maaß
چکیده

We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we apply a Rocchio classifier on a dataset from a Web 2.0 site and show that it is suitable for semi-automated labelling (often called tagging) of short texts and is faster than other approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents

Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, it is still a challenge task to model long texts, such as sentences and documents. In this paper, we propose a multi-timescale long short-termmemory (MT-LSTM) neural network to model long texts. MTLSTM partitions the hidden states of the standard LSTM into several groups. Each g...

متن کامل

Hand Gestures Classification with Multi-Core DTW

Classifications of several gesture types are very helpful in several applications. This paper tries to address fast classifications of hand gestures using DTW over multi-core simple processors. We presented a methodology to distribute templates over multi-cores and then allow parallel execution of the classification. The results were presented to voting algorithm in which the majority vote was ...

متن کامل

Author identification: Using text sampling to handle the class imbalance problem

Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorization problem. Very often, there are extremely few training texts at least for some of the candidate authors or there is a significant variation in the text-length among the available training texts of the candidate author...

متن کامل

A Comparison of Language Identification Approaches on Short, Query-Style Texts

In a multi-language Information Retrieval setting, the knowledge about the language of a user query is important for further processing. Hence, we compare the performance of some typical approaches for language detection on very short, query-style texts. The results show that already for single words an accuracy of more than 80% can be achieved, for slightly longer texts we even observed accura...

متن کامل

VIPE: A new interactive classification framework for large sets of short texts - application to opinion mining

This paper presents a new interactive opinion mining tool that helps users to classify large sets of short texts originated from Web opinion polls, technical forums or Twitter. From a manual multilabel pre-classification of a very limited text subset, a learning algorithm predicts the labels of the remaining texts of the corpus and the texts most likely associated to a selected label. Using a f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008